Setting up project folders and libraries

This section outlines the initial setup. It involves loading specific R functions, setting up necessary libraries, and organizing data directories to streamline file management and data processing.

Loading necessary functions. This code chunk is responsible for importing custom R functions necessary for processing and analyzing ATAC-seq data. These functions are stored externally and are essential for the specific analyses that will be conducted in subsequent steps.

# Load functions from a script located in the working directory for downstream analysis
source(paste0(wd,"/atac_downstrem_analysis_functions.R"))

Loading required libraries. A suite of R packages is loaded here, enabling data manipulation, visualization, and advanced statistical analysis

# Load general purpose libraries necessary for data manipulation, visualization, and string operations
library(tidyverse)  # Comprehensive collection of data manipulation and visualization tools
library(stringr)    # Simplifies string operations
library(readr)      # Efficiently reads and writes tabular data
library(ggplot2)    # Creates elegant data visualisations using the grammar of graphics
library(scales)     # For scaling functions for visualization
library(tibble)     # Provides a modern rethinking of data frames

# Load specific libraries for genomics data analysis
library(edgeR)      # For differential expression analysis
library(ggrepel)    # Helps with non-overlapping text labels on plots
library(dendextend) # For enhancing dendrogram objects
library(EnhancedVolcano) # For creating enhanced volcano plots
library(topGO) #To perform GO enrichment analyisis

The input data directories are systematically organized to ensure easy access and management of data files necessary for the analysis:

  • 0_EXPERIMENT_INFO: Contains metadata about the experiments, such as sample details, experimental conditions…
  • 1_COUNTS: Stores count matrices.
  • 2_PEAK_INFO: Includes information about genomic regions of open chromatin identified in the ATAC-seq data.
  • 3_OTHER_SP_DATA: Holds comparative ATACseq data from other species that will be used for cross-species analysis. This data comes from publicly available data.
  • 4_NATCOM2020_INFO: Specific mayfly genomic information coming from Almudi et al., 2020.
  • 5_RNA-SEQ: Directory for storing RNA-seq data files.

We save the paths to this folders in variables to acces them during all the analysis

info  <- paste0(wd,"0_EXPERIMENT_INFO/")
counts  <- paste0(wd,"1_COUNTS/")
peak_info <- paste0(wd,"2_PEAK_INFO/")
sp_data <- paste0(wd,"3_OTHER_SP_DATA/")
natcom2020 <- paste0(wd,"4_NATCOM2020_INFO/")
rnaseq <- paste0(wd,"5_RNA-SEQ/")

The output data structure is geared towards organizing the results of the analysis into logical and functional directories, making it easier to locate and use the data in subsequent steps or for final reporting.

  • A2_DIFFERENTIAL_PEAK: This is the main folder designated for storing all outputs related to differential peak analysis. It acts as a central hub for aggregating analytical results. This will be designed as our analysis folder.
  • output_images: A subfolder created within the main analysis folder to store all graphical representations produced during the analysis.
  • output_files: This directory is reserved for statistical outputs, such as tables.
analysis_folder <- paste0(wd,"A2_DIFF_CHROMATIN_ACCESSIBILITY/")
imgdir  <- paste0(analysis_folder,"/output_images/")
statdir <- paste0(analysis_folder,"/output_files/")

Importing and organizing data

C. dipterum ATAC data

To initiate our analysis of ATAC-seq data for C. dipterum, we first import the sample information. This data includes crucial details for each sample, which are essential for subsequent analyses:

samples_info <- data.frame(read_delim(paste0(info,"info_samples.csv"),
                           delim = ",", 
                           escape_double = FALSE, 
                           trim_ws = TRUE))
samples_info
##    stage                name replicate sample
## 1   emb4  embryo_stage4_rep1         1   e4_1
## 2   emb4  embryo_stage4_rep2         2   e4_2
## 3   emb6  embryo_stage6_rep1         1   e6_1
## 4   emb6  embryo_stage6_rep6         2   e6_2
## 5   emb8  embryo_stage8_rep1         1   e8_1
## 6   emb8  embryo_stage8_rep2         2   e8_2
## 7  emb10 embryo_stage10_rep1         1  e10_1
## 8  emb10 embryo_stage10_rep2         2  e10_2
## 9  emb12 embryo_stage12_rep1         1  e12_1
## 10 emb12 embryo_stage12_rep2         2  e12_2
## 11 emb14 embryo_stage14_rep1         1  e14_1
## 12 emb14 embryo_stage14_rep2         2  e14_2

As our focus is on dynamic peaks identified in a previous analysis (see Downstream analysis 1:Quantitative mapping of open chromatin regions), we proceed by loading normalized count data associated with these peaks:

dynamic_norm_counts_df <- data.frame(read_delim(paste0(counts,"dynamic_peaks_sample_norm.tsv"),
                                                        delim = "\t", escape_double = FALSE,
                                                        trim_ws = TRUE))

Next, we import detailed peak information that links genomic zones and genes, facilitating deeper biological insights:

cdip_peak_zone_associated_gene <- data.frame(read_delim(paste0(peak_info,"/clodip_embryoATAC_peaks_and_genes.tsv"),
                                                        delim = "\t", escape_double = FALSE,
                                                        trim_ws = TRUE))
cdip_peak_zone_associated_gene <- cdip_peak_zone_associated_gene %>% dplyr::rename(peak_id = peak)
rownames(cdip_peak_zone_associated_gene) <- cdip_peak_zone_associated_gene$peak_id
head(cdip_peak_zone_associated_gene)
##       peak_id promoter    proximal   gene_body
## peak1   peak1     <NA>        <NA> clodip_v3_1
## peak2   peak2     <NA>        <NA> clodip_v3_1
## peak3   peak3     <NA>        <NA>        <NA>
## peak4   peak4     <NA>        <NA>        <NA>
## peak5   peak5     <NA> clodip_v3_4        <NA>
## peak6   peak6     <NA>        <NA> clodip_v3_4
cdip_peak_unique_zone<- data.frame(read_delim(paste0(peak_info,"/clodip_embryoATAC_peaks_and_zones.tsv"),
                                              delim = "\t", escape_double = FALSE,
                                              trim_ws = TRUE))

cdip_peak_unique_zone <- cdip_peak_unique_zone %>% dplyr::rename(peak_id = peak)
rownames(cdip_peak_unique_zone) <- cdip_peak_unique_zone$peak_id
head(cdip_peak_unique_zone)
##       peak_id     zone
## peak1   peak1 genebody
## peak2   peak2 genebody
## peak3   peak3   distal
## peak4   peak4   distal
## peak5   peak5 proximal
## peak6   peak6 genebody

Further, we load the gene associations for each peak, which are critical for functional genomic analysis:

cdip_peak_associated_gene <- read_delim(paste0(peak_info,"/clodip_embryoATAC_peak_to_gene.tsv"), 
                             delim = "\t", escape_double = FALSE, 
                             trim_ws = TRUE)
head(cdip_peak_associated_gene)
## # A tibble: 6 × 3
##   peak_id clodip_v3   zone    
##   <chr>   <chr>       <chr>   
## 1 peak1   clodip_v3_1 genebody
## 2 peak2   clodip_v3_1 genebody
## 3 peak3   <NA>        distal  
## 4 peak4   <NA>        distal  
## 5 peak5   clodip_v3_4 proximal
## 6 peak6   clodip_v3_4 genebody

Gene Onthology Data

For the analysis of C. dipterum, we incorporate a new gene annotation version (clodip_v3). To align this with other publicly available datasets that may use different annotations, we load a transcript equivalency table. This table, based on BLAST results, helps in correlating transcripts from the older gene models to the new annotation.

cd_to_v3_transcripts <- read_table(paste0(natcom2020,"Cdip_ah2p_vs_clodip_v3_ids.txt"),
                       col_names = FALSE)
colnames(cd_to_v3_transcripts) <- c("Cdip_ah2p", "clodip_v3")

Lastly, we retrieve detailed gene equivalency data, which allows us to map genes from different versions effectively:

cd_to_v3_genes <- read_delim(paste0(natcom2020,"Cdip_ah2p_vs_clodip_v3_ids_gene.txt"), 
                             delim = "\t", escape_double = FALSE, 
                             trim_ws = TRUE)

This block reads a CSV file containing gene identifiers and their associated GO terms. Several columns are skipped because they are not required for the subsequent analysis, focusing only on the columns that map Cdip_ah2p gene identifiers to go_id. This data comes from Uniprot.

cd_go <- read_csv(paste0(natcom2020,"CD_uniprotF.csv"),
                  col_types = cols(...1 = col_skip(), 
                                   Entry = col_skip(),
                                   Names = col_skip(), 
                                   Protein.names = col_skip()))
colnames(cd_go) <- c("Cdip_ah2p", "go_id")

This section of code merges the cd_go data with another dataset that contains mappings between Cdip_ah2p and the latest gene version identifiers (clodip_v3).

v3_go <- inner_join(cd_to_v3_genes, cd_go, by = "Cdip_ah2p") %>%
  dplyr::select(clodip_v3, go_id) %>%
  group_by(clodip_v3) %>%
  summarise(go_id = toString(gsub('"', '', go_id)), .groups = "drop")

Finally, the processed data, which now links the new gene identifiers with their corresponding GO terms, is written to a tab-separated text file. This file will be useful for downstream analyses.

write.table(v3_go, 
            paste0(natcom2020,"/genesv32GO.txt"), 
            sep = "\t", row.names = F, quote = FALSE)

Parameters setup

This section establishes essential variables and aesthetic settings for the analysis. By carefully defining these parameters, we ensure that the analysis is not only rigorous but also that the resulting visualizations are coherent and informative.

Cut off thresholds

First, we set the statistical thresholds, crucial for determining significance in our differential analysis:

# Setting minimum filter threshold
MIN_FILT = 10

Here, we define thresholds for fold change and p-values. A fold change threshold of 20 ensures that only the most substantially different expressions are considered, while a p-value below 0.05 is commonly accepted for statistical significance.

# Defining Fold Change and p-value thresholds for significance in the analysis
MIN_FC <- 2 # Fold Change threshold
MIN_PV <- 0.05 # p-value threshold

Conditions to Study

Defining experimental conditions allows for targeted differential analysis across developmental stages. This block of code categorizes samples into developmental stages, which are crucial for comparing chromatin accessibility changes over time.

# Creating a factor variable for the stages of development, which will be used in differential analysis
condition_factors <- factor(samples_info$stage,
                            levels=c("emb4","emb6","emb8","emb10", "emb12", "emb14"))

# Assigning the factor to a new column in samples_info for easy reference
samples_info$condition <- condition_factors

Aesthetic Parameters

Important: To ensure your analysis runs smoothly and your graphs look exactly as intended, it’s crucial that the names you use in the following lists exactly match those defined in your data files. This consistency is key for linking your data correctly with the aesthetic elements like colors.

Now, let’s define some color parameters to keep our graphs looking consistent and visually appealing:

# Defining color schemes for the conditions, ensuring they match those in samples_info$condition
condition_colors <- c("emb4"="#0072B2",  
                      "emb6"="#009E73",  
                      "emb8"="#D55E00",  
                      "emb10"="#CC79A7", 
                      "emb12"="#F0E442", 
                      "emb14"="#56B4E9")

# Setting colors for individual samples within each condition
sample_colors <- c("e4_1"="#0072B2", 
                   "e4_2"="#72C2FF",  
                   "e6_1"="#009E73",
                   "e6_2"="#72FFB4",  
                   "e8_1"="#D55E00",
                   "e8_2"="#FFA64D", 
                   "e10_1"="#CC79A7",
                   "e10_2"="#FFB3D9", 
                   "e12_1"="#F0E442",
                   "e12_2"="#FFFF99",
                   "e14_1"="#56B4E9",
                   "e14_2"="#99D6FF")

# Color coding for different peak zones
zone_colors <- c("promoter"="#DDC50F", 
                 "proximal"="#228B22",  
                 "genebody"="#4169E1",
                 "distal"="#984EA3")

# Additional color settings for states of the chromatin
state_colors <- c("open"="#48A9A6",
                  "closed"="#C97C5D",
                  "no-sig"="grey")

We also specify the order in which samples and stages should be arranged for visual consistency across figures:

# Ordering of samples for plots
sample_order <- c("e4_1", 
                  "e4_2", 
                  "e6_1", 
                  "e6_2", 
                  "e8_1", 
                  "e8_2", 
                  "e10_1", 
                  "e10_2", 
                  "e12_1", 
                  "e12_2", 
                  "e14_1", 
                  "e14_2")

# Mapping of stages to descriptive names
stage_order <- c("emb4"="Stage 4",
                 "emb6"="Stage 6", 
                 "emb8"="Stage 8", 
                 "emb10"="Stage 10", 
                 "emb12"="Stage 12", 
                 "emb14"="Stage 14")

Exploratory analysis of sample diversity

This section of the analysis provides a visual exploration of the diversity among samples based on their normalized count data from dynamic peaks. The visualizations aim to capture variations in gene expression patterns and chromatin accessibility.

Principal Component Analysis (PCA)

The PCA plot helps to visualize the overall variability among the samples. By reducing the dimensionality of the data, PCA can highlight patterns of similarity and divergence across samples, which may correlate with different developmental stages or experimental conditions.

nom_counts_matrix <- as.matrix(filter_peakid(dynamic_norm_counts_df))
pcaData <- prcomp(nom_counts_matrix, scale=T)
pcaData.df <- as.data.frame(pcaData$rotation)
percentVar <- round(summary(pcaData)$importance[2,]*100,2)

pcaData.df$sample <- as.factor(row.names(pcaData.df))

pca.expr_raw <- ggplot(pcaData.df, aes(x=PC1,y=PC2, color=sample, label=sample)) +
  geom_point(size=3) +
  scale_color_manual(values=sample_colors) +
  xlab(paste0("PC1: ",percentVar[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar[2],"% variance")) +
  geom_label_repel(aes(label = sample),
                   box.padding   = 0.35,
                   point.padding = 0.5,
                   segment.color = 'grey50') +
  theme_light() +
  theme(legend.position="none")
print(pca.expr_raw)

save_ggplot(pca.expr_raw, "pca")
## png 
##   2

Here, PC1 and PC2 capture the primary sources of variation, representing significant axes of gene expression changes among samples. This visualization helps in identifying clusters of samples with similar expression profiles. We can see a clear difference between early and late stages.

Hierarchical clustering

Hierarchical clustering provides another approach to examine relationships among samples. It uses correlation-based distance metrics to cluster samples, which can reveal groups based on chromatin accessibility patterns. The dendrogram visualizes the hierarchical relationships, with color coding to reflect the developmental stage of each sample. This analysis, simalry to the PCA, helps identify which samples are more similar to each other, potentially indicating common regulatory mechanisms or stages of development.

dist <- as.dist(1-cor(nom_counts_matrix, method="spearman"))
distclust <- hclust(dist)
dendrogram <- as.dendrogram(distclust, hang=0.1)
labels_colors(dendrogram) <- sample_colors[samples_info$condition][order.dendrogram(dendrogram)]
dend_raw <- function(){plot(dendrogram,
                 horiz = FALSE,
                 yaxt='n', ann=FALSE)}
dend_raw()

save_ggplot(dend_raw(), "dendogram")
## NULL
## NULL
## png 
##   2

Differential chromatin accessibility analysis

This analysis section aims to identify significant changes in chromatin accessibility between developmental stages of C. dipterum, providing insights into the regulatory dynamics that may influence developmental processes.

We begin by specifying a linear model to capture the relationships between conditions and gene expression levels, facilitating the identification of differential peaks across stages. This setup allows us to test specific hypotheses about changes between consecutive developmental stages.

#define the matrix
design_condition   <- model.matrix( ~0+ samples_info$condition)
colnames(design_condition) <- unique(samples_info$condition)
rownames(design_condition) <-samples_info$sample

#define the contrasts and add to the matrix
contrasts= c(paste("emb4","emb6", sep="-"),
             paste("emb6","emb8", sep="-"),
             paste("emb8","emb10", sep="-"),
             paste("emb10","emb12", sep="-"),
             paste("emb12","emb14", sep="-"))
contrasts_matrix <- makeContrasts(contrasts=contrasts, levels=design_condition)

attr(design_condition, "contrasts") <- list(condition = contrasts_matrix)

In the differential expression analysis for C. dipterum, we perform several key statistical steps to identify changes in chromatin accessibility. These steps refine our model to yield reliable insights into regulatory changes across developmental stages.

# Fit the linear model
fit <- lmFit(nom_counts_matrix,
             design_condition)

# Apply the contrasts to the fitted model
fitt <- contrasts.fit(fit, contrasts_matrix)

# Compute eBayes statistics
fitt <- eBayes(fitt, trend=TRUE)

We prepare a detailed table of results to further explore and quantify the changes observed:

make_top_table <- function(fit_model, contrast, MIN_PV=0.05, MIN_FC=1) {
  # Obtain top results from topTable
  top.res.t <- topTable(fit_model, coef=contrast,
                        adjust.method="none", sort.by="B",
                        number=Inf)

  # Directional decisions based on P.Value and logFC
  top.res.t$DIFF <- as.factor(ifelse(top.res.t$P.Value <= MIN_PV,
                                     ifelse(top.res.t$logFC >= MIN_FC, "open",
                                            ifelse(top.res.t$logFC <= -MIN_FC, "closed", "no-sig")),
                                     "no-sig"))
  
  top.res.s <- top.res.t
  top.res.s$peak_id <- rownames(top.res.t
                                )
  # Save results to a file
  file_path <- paste0(statdir, "diff_chromatin_", contrast, ".txt")
  write.table(top.res.s, 
              file = file_path, 
              sep = "\t", 
              quote = FALSE,
              col.names = TRUE, row.names = F)
  return(top.res.t)
}

top_table_list <- lapply(contrasts, 
                         function(x) make_top_table(fitt, x))
names(top_table_list) <- contrasts

The following visualization displays the number of peaks that have become more accessible (opened) or less accessible (closed) between stages, offering a snapshot of dynamic chromatin changes.

de_df <- data.frame(t(sapply(1:length(top_table_list),
                           function(i){table(top_table_list[[i]]$DIFF)})),
                    stg_trans=contrasts) %>%
  pivot_longer(cols = -stg_trans, names_to = "exp", values_to = "n")

filtered_df <- de_df[de_df$exp != "no.sig", ]
filtered_df$stg_trans <- factor(filtered_df$stg_trans, levels = contrasts)
n_diff_plot <- ggplot(filtered_df, aes(x = stg_trans, y = n, color = exp, group = exp, label = n)) +
  geom_line() +
  # geom_point() +
  geom_label_repel(aes(fill = exp), color = "black", size = 3, show.legend = FALSE, segment.size = 0.2) +
  labs(title = "Line Plot of n vs stg_trans",
       x = "Stage transition",
       y = "Number of differentially regulated peaks",
       color = "Chromatin state") +
  scale_color_manual(values = state_colors) +
  scale_fill_manual(values = state_colors) +  # Add fill scale for geom_label_repel
  theme_classic()
print(n_diff_plot)

save_ggplot(n_diff_plot, "num_diff_peaks")
## png 
##   2

Volcano plots enhance our understanding by visually distinguishing between significantly upregulated (opened) and downregulated (closed) peaks, integrating the impact of changes in expression magnitude and statistical significance.

my_volcanoplot <- function(top.res.table, labels_column, contrast_name) {
  #define some values for the caption
  UPDN <- c(nrow(top.res.table[ top.res.table$DIFF == "open", ]),
            nrow(top.res.table[ top.res.table$DIFF == "closed", ]))
  
  #define color
  colorkey <- state_colors[match(levels(top.res.table[["DIFF"]]), names(state_colors))][top.res.table[["DIFF"]]]
  
  EnhancedVolcano(data.frame(top.res.table), x = 'logFC', y = 'P.Value',
                  lab = labels_column,
                  xlab = bquote(~Log[2]~ 'fold change'),
                  ylab = bquote(~-Log[10]~ 'Pvalue'),
                  xlim = c(min(top.res.table$logFC, na.rm = TRUE) - 0.5, max(top.res.table$logFC, na.rm = TRUE) + 0.5),
                  ylim = c(0, max(-log10(top.res.table$P.Value), na.rm = TRUE) + 0.5),
                  pCutoff = MIN_PV, FCcutoff = log2(MIN_FC), pointSize = 1.0, labSize = 2.0,
                  title = "Volcano Plot",
                  subtitle = contrast_name,
                  caption = paste0('log2 FC cutoff: ', MIN_FC, '; p-value cutoff: ',
                                   MIN_PV, '\nTotal = ', nrow(top.res.table),
                                   ' peaks  [ ',UPDN[2],' close ,',UPDN[1],' open ]'),
                  legendPosition = 'bottom', legendLabSize = 14, legendIconSize = 5.0,
                  colCustom = colorkey,
                  drawConnectors = TRUE,
                  widthConnectors = 0.25,
                  max.overlaps=5,
                  colConnectors = 'grey30')
  }
volcano_plot_list <- lapply(1:length(top_table_list),
       function(i) {
         my_volcanoplot(top.res.table=top_table_list[[i]],
                        labels_column=rownames(top_table_list[[i]]),
                        contrast_name=names(top_table_list)[[i]])
       })

print(volcano_plot_list)
## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

## 
## [[5]]

lapply(1:length(volcano_plot_list),
       function(i) {
         contrast = contrasts[[i]]
         save_ggplot(volcano_plot_list[[i]], 
                     paste0("volcano_plot_", contrast))
         })
## [[1]]
## png 
##   2 
## 
## [[2]]
## png 
##   2 
## 
## [[3]]
## png 
##   2 
## 
## [[4]]
## png 
##   2 
## 
## [[5]]
## png 
##   2

Gene Ontology (GO) enrichment analysis

Prepare the genes to use as database (all cdip genes)

geneID2GO <- readMappings(paste0(natcom2020,"/genesv32GO.txt"))
geneID <- names(geneID2GO)

This function handles data extraction, gene retrieval, GO data preparation, enrichment testing using the elim algorithm, and saving of the results. It enables systematic and repeatable analysis across various data subsets and conditions.

top10GO_fun <- function(contrast, state) {
  # Extract peaks for the specified state from differential expression results
  peaks <- rownames(top_table_list[[contrast]][top_table_list[[contrast]]$DIFF==state,])
  
  # Retrieve corresponding genes associated with those peaks
  genes_peaks <- cdip_peak_associated_gene %>% 
            filter(peak_id %in% peaks) %>%
            dplyr::select(clodip_v3) %>%
            filter(!is.na(clodip_v3)) %>%
            pull(clodip_v3)
  
  # Prepare gene list for GO analysis
  geneList <- factor(as.integer(geneID %in% genes_peaks))
  names(geneList) <- geneID
  
  # Set up topGO data object for Biological Process ontology
  GOdata <- new("topGOdata", 
                ontology = "BP", 
                allGenes = geneList,
                annot = annFUN.gene2GO, 
                gene2GO = geneID2GO)
  
  # Perform enrichment test using the 'elim' algorithm
  resultTopGO.elim <- runTest(GOdata, 
                              algorithm = "elim", 
                              statistic = "Fisher")
  
  # Extract top results and order them
  topResults <- GenTable(GOdata, 
                         elimKS = resultTopGO.elim,
                         orderBy = "elimKS", 
                         topNodes = 10)
  
  # Save results to a file
  file_path <- paste0(statdir, "BP_top10GO_", contrast, "_" ,state, ".txt")
  write.table(topResults, 
              file = file_path, 
              sep = "\t", 
              quote = FALSE,
              col.names = TRUE, row.names = FALSE)
  
  return(topResults)
}

The code below applies the previously defined top10GO_fun function to conduct enrichment analysis for both ‘open’ and ‘closed’ chromatin states across all contrasts in the dataset. Results are systematically named and stored, facilitating easy access for subsequent analysis.

# Perform GO enrichment analysis for 'open' state across all contrasts
open_results <- lapply(contrasts, function(c) top10GO_fun(contrast = c, state = "open"))
names(open_results) <- contrasts

# Perform the same for 'closed' state
close_results <- lapply(contrasts, function(c) top10GO_fun(contrast = c, state = "closed"))
names(close_results) <- contrasts

This function, plot_GO, is designed to generate plots that visually represent the GO enrichment results. It adjusts data for plotting, uses ggplot2 to create visually appealing graphs, and saves each plot systematically.

plot_GO <-function(topGO_res, contrast, state) {
  
  # Adjust numeric values for plotting
  advanced_as.numeric <- function(x) {
    x <- gsub("<", "", x)
    numeric_values <- as.numeric(x)
    return(numeric_values)
    }
  
  # Extract results for specific contrast and adjust p-values
  topResults <- topGO_res[[contrast]]
  topResults$elimKS <- advanced_as.numeric(topResults$elimKS)
  topResults <- topResults[topResults$elimKS<0.05,]
  topResults$Term <- factor(topResults$Term, levels=rev(topResults$Term))
  
  # Create a ggplot of the enrichment scores
  go_plot <- ggplot(topResults,
                aes(x = Term, y = -log10(elimKS), size = Significant, fill = -log10(elimKS))) +
    geom_point(shape = 21) +
    scale_fill_gradient(high = state_colors[state],
                        low = "white") +
    xlab('') + ylab('Enrichment score') +
    labs(
      title =  paste0('GO enrich. ', contrast, " " ,state, " chromatin"),
    ) +
    theme_classic() +
    theme(legend.position = "none",
          panel.grid.major=element_blank(), 
          panel.border=element_blank(),
          text = element_text(size = 20))+
    coord_flip()
  
  # Save the plot
  save_ggplot(go_plot, 
              paste0("BP_top10GO_", contrast, "_" ,state))
  
  return(go_plot)
}
# Generate and display GO enrichment plots for 'open' state
lapply(contrasts, function(c) plot_GO(open_results, contrast = c, state = "open"))
## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

## 
## [[5]]

# Generate and display GO enrichment plots for 'closed' state
lapply(contrasts, function(c) plot_GO(close_results, contrast = c, state = "closed"))
## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

## 
## [[5]]

Session info

sessionInfo()
## R version 4.3.3 (2024-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: elementary OS 7.1 Horus
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=ca_ES.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=ca_ES.UTF-8        LC_COLLATE=ca_ES.UTF-8    
##  [5] LC_MONETARY=ca_ES.UTF-8    LC_MESSAGES=ca_ES.UTF-8   
##  [7] LC_PAPER=ca_ES.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=ca_ES.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Europe/Madrid
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] topGO_2.52.0           SparseM_1.81           GO.db_3.17.0          
##  [4] AnnotationDbi_1.62.2   IRanges_2.34.1         S4Vectors_0.38.2      
##  [7] Biobase_2.60.0         graph_1.78.0           BiocGenerics_0.46.0   
## [10] EnhancedVolcano_1.18.0 dendextend_1.17.1      ggrepel_0.9.5         
## [13] edgeR_3.42.4           limma_3.56.2           scales_1.3.0          
## [16] lubridate_1.9.3        forcats_1.0.0          purrr_1.0.2           
## [19] tidyverse_2.0.0        tidyr_1.3.1            stringr_1.5.1         
## [22] tibble_3.2.1           readr_2.1.5            dplyr_1.1.4           
## [25] ggplot2_3.5.0         
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.1        viridisLite_0.4.2       farver_2.1.1           
##  [4] blob_1.2.4              viridis_0.6.5           Biostrings_2.68.1      
##  [7] bitops_1.0-7            fastmap_1.1.1           RCurl_1.98-1.14        
## [10] digest_0.6.35           timechange_0.3.0        lifecycle_1.0.4        
## [13] KEGGREST_1.40.1         RSQLite_2.3.6           magrittr_2.0.3         
## [16] compiler_4.3.3          rlang_1.1.3             sass_0.4.9             
## [19] tools_4.3.3             utf8_1.2.4              yaml_2.3.8             
## [22] knitr_1.46              labeling_0.4.3          bit_4.0.5              
## [25] withr_3.0.0             grid_4.3.3              fansi_1.0.6            
## [28] colorspace_2.1-0        cli_3.6.2               rmarkdown_2.26         
## [31] crayon_1.5.2            generics_0.1.3          rstudioapi_0.16.0      
## [34] httr_1.4.7              tzdb_0.4.0              DBI_1.2.2              
## [37] cachem_1.0.8            splines_4.3.3           zlibbioc_1.46.0        
## [40] parallel_4.3.3          XVector_0.40.0          matrixStats_1.3.0      
## [43] vctrs_0.6.5             jsonlite_1.8.8          hms_1.1.3              
## [46] bit64_4.0.5             locfit_1.5-9.9          jquerylib_0.1.4        
## [49] glue_1.7.0              codetools_0.2-19        stringi_1.8.3          
## [52] gtable_0.3.4            GenomeInfoDb_1.36.4     munsell_0.5.1          
## [55] pillar_1.9.0            htmltools_0.5.8.1       GenomeInfoDbData_1.2.10
## [58] R6_2.5.1                vroom_1.6.5             evaluate_0.23          
## [61] lattice_0.22-5          highr_0.10              png_0.1-8              
## [64] memoise_2.0.1           bslib_0.7.0             Rcpp_1.0.12            
## [67] gridExtra_2.3           xfun_0.43               pkgconfig_2.0.3